ABSTRACT


Despite the prevalence of voting systems in the real world there is 
no consensus among researchers of how people vote strategically, 
even in simple voting settings. This paper addresses this gap by com- 
paring different approaches that have been used to model strategic 
voting, including expected utility maximization, heuristic decision- 
making, and bounded rationality models. The models are applied 
to data collected from hundreds of people in controlled voting ex- 
periments, where people vote after observing non-binding poll 
information. We introduce a new voting model, the Attainability- 
Utility (AU) heuristic, which weighs the popularity of a candidate 
according to the poll, with the utility of the candidate to the voter. 
We argue that the AU model is cognitively plausible, and show that 
it is able to predict people’s voting behavior significantly better 
than other models from the literature. It is almost on par with (and
sometimes better than) a machine learning algorithm that uses
substantially more information. Our results provide new insights into
the strategic considerations of voters, that undermine the prevalent 
assumptions of much theoretical work in social choice. 
1 INTRODUCTION


Voting is a commonplace tool for group decision making, used in 
political elections, in professional committees, in local assemblies, 
and also in online platforms such as Doodle.com and robovote.org. 
While there is general consensus that people vote strategically, 
understanding individual voting behavior is a challenging open 
question. Due to inherent uncertainty about other people’s votes, 
the strategies that people apply are far from obvious. 

Researchers in economics, political science, and more recently in 
AI and computational social choice, have suggested various models
to represent and reason about voters’ decision making under
uncertainty. These include models of utility maximization, heuristics,
and bounded rationality (see below). In a recent paper, Meir, Lev and
Rosenschein [19] suggested different criteria for evaluating models 
of strategic voting, that included “theoretic criteria” (such as generality
and discriminative power among actions), “behavioral criteria”
(such as cognitive plausibility), and “scientific criteria” (such as 
alignment with empirical data). Theoretical analyses of voting models
are abundant (see Sec. 2.1), and in this work we focus on the
latter two kinds of criteria.


Research Goal. The goal of the paper is to study strategic choices 
of human voters, and in particular to test how their individual 
behavior fits different types of models. We use real world data 
from controlled experiments in which human voters either faced a 
non-binding poll or played a strategic game versus other people. 

We follow Wright and Leyton-Brown [33], who separate data
collected from strategic games into training and test sets, and
compare strategic decision-making models by their predictive
performance on the test data. If a certain model predicts the
behavior of many voters well, this is an important indication of the
plausibility of this model. Prediction is a standard evaluation
metric in behavioral economics [8, 12]. Thus a model's predictive
performance should be considered in addition to its theoretical
properties, the cognitive limitations of the voters, and so on. By
further analyzing which models succeed and when they fail, we hope
to better understand the considerations that guide people’s
strategic choices.


1.1 Theoretical Models 


We briefly describe common approaches for modeling strategic 
voting behavior in the theoretical literature. 


Expected utility maximization. A rational voter maximizes her 
expected utility with respect to a probability distribution over the 
actions of the other voters. The distribution itself may be given 
exogenously (e.g., by a poll), or derived via equilibrium analysis 
from the uncertain preferences of the other voters. Such models 
were developed mainly in the economics literature and are some- 
times known as the “calculus of voting” [20, 21, 27]. A somewhat 
different model was suggested by Bowman et al. [6] for voting on 
multiple binary issues. The model explicitly estimates the “attain- 
ability” of each issue (the probability it gets a majority of the votes), 
and uses this estimate to calculate the expected utility of every 
possible combinatorial vote. 


Heuristic decision-making. A voter uses some function that 
maps any given situation to an action. The voter is not assumed to 
be rational, and may not even have a cardinal utility measure or an 
explicit probabilistic representation of the different outcomes. For
example, a voter following the k-pragmatist heuristic behaves as
if only the k leading candidates are participating [26].


Bounded rationality. A voter makes a rational strategic decision 
based on a subjective, rather than accurate, belief. These models 
present a mid-point between utility maximization and heuristics. 
One example of such a model is local dominance [19], which assumes
that each voter derives a set of possible outcomes based on a poll,
and then selects a non-dominated action within these outcomes. 
Similar probability-free approaches were followed in [11, 22]. 


All the models we work with assume that individual voters may 
behave differently, but each one follows a deterministic, consistent 
voting strategy. They are unable to perfectly explain or predict votes 
that have a random component or where voter behavior changes 
over time. Therefore, if the models can still explain the data, it 
would mean that noise and learning only play a secondary role in 
people’s voting decisions. We go back to this point when analyzing 
the results. 

Other models, such as quantal-response equilibrium (QRE) [16, 
17], sampling equilibrium [24] or trembling hand equilibrium [23] 
assume voters act stochastically. Evaluation of such models is much 
easier on the aggregate rather than individual level (see below), and 
therefore they are outside the scope of the current paper. 


1.2 Previous Empirical Work 


While the literature is abundant with voting experiments, the vast 
majority analyze voter behavior in political or organizational elec- 
tions [1, 5, 13, 25, 31]. These studies test how well historic election 
results fit various game-theoretic models, without any considera- 
tion of individual votes. Further, each voter makes a single strategic 
decision, and her true preferences are typically unknown. 


Explaining aggregate voting behavior. Some controlled experi- 
ments track voters’ decisions in different situations (e.g., [14, 30, 32]). 
Most of these experiments included groups of 12-70 subjects who 
played a repeated strategic voting game, knowing the preferences of 
others but not how they are going to vote. Yet these papers focused 
on how well aggregate behavior fits the equilibrium models. For 
example, in the QRE model used in Tyszler and Schram [30], voters 
are assumed to vote for low utility candidates with some probabil- 
ity, which is determined by a parameter of the model. A model is 
considered to be an adequate explanation for a dataset if there are 
some parameters that result in a similar distribution of votes (e.g., 
a similar rate of strategic compromise) to the one observed in the 
data. However, such models are not designed to track individual 
behavior and whether it is consistent. For example, a 20% rate of 
strategic compromise could result either from a small group that is 
consistently strategic, from all voters being occasionally strategic, 
or even from some random component in the behavior. 


Explaining individual voting behavior. Blais et al. compare
individual behavior to rational models, inferring voters’ parameters 
from verbal surveys [5] or from carefully designing the conditions 
of a controlled experiment [4]. However, they focused on voting 
with two candidates, where the only strategic decision is whether 
to vote or abstain. 

We emphasize that all of the above work tested how well empir- 
ical data fits the theory in retrospect, without dividing the data into 
separate training and test sets. This approach may cause overfitting, 
especially in complex models with many parameters. 

Tal et al. [28] study voter behavior under poll information, but
neither compared against any existing decision model nor suggested
a new one. They demonstrated empirically that there are voters
exhibiting different behaviors: in particular, truthful voters, voters
who compromise strategically, and voters that tend to vote for
the poll leader (“leader biased”). These findings were part of our
inspiration to focus on understanding individual votes.

Our contributions and results are as follows: 


(1) We provide a new voting model called the Attainability- 
Utility (AU) heuristic. The model is based on the model sug- 
gested by Bowman et al. [6] which considers the attainability 
of a binary issue (the probability it is accepted in referen- 
dum) when computing the expected benefit of a vote. The 
AU heuristic extends this model in two ways, first by consid- 
ering multi-candidate voting settings, second by including 
a parameter that measures the tradeoff between how much 
the voter values candidates’ attainability given the poll in- 
formation versus their utility (if selected). 

(2) We collect the strategic decisions of 520 people in voting 
experiments with three candidates, where participants each 
play up to 36 rounds, each round with different poll infor- 
mation and preferences (more than 14,000 decisions in total). 
All of the data and code will become available for the public 
using repositories such as votelib.org. 

(3) Using behavioral data from our experiments as well as from 
Tal et al. [28] and Tyszler and Schram [30], we compare 
the performance of the AU approach to that of theoretical 
decision models from the literature and to benchmarks set 
by off-the-shelf machine learning algorithms. 

(4) Our results show that the AU model outperforms all other 
voting models, some of them by a large margin, and gets 
close to the benchmark set by machine learning algorithms. 
In particular, AU is able to capture much of the behaviors 
described by the models in Sec. 2.1. Most errors in the prediction
of the AU model can be attributed to participants who
played only a few rounds, demonstrated random behavior,
and/or changed their strategy during the experiment.

(5) Our main insight from the success of the AU model is that 
people independently evaluate each candidate and use simple 
substitutes to probabilistic calculations. These findings are 
in line with the more general research on decision making 
under uncertainty, and at odds with the underlying assump- 
tions of most models from the social choice literature. 


This is the first paper to provide an empirical evaluation of theo- 
retical decision making models on individual voter behavior under 
poll information, and the first to test the predictive performance of 
any voting model in general. Understanding the strategic decisions
made by voters of different types is crucial to the development,
analysis and application of voting rules in strategic environments, 
and can inform the design of agents for making voting decisions 
with other people [3, 34]. 


2 PRELIMINARIES 


We consider a single voter who faces a decision, to vote for one of 
several candidates C. We use the Plurality rule which collects the to- 
tal number of votes for each candidate, and returns the candidate(s) 
with the largest number of votes. 

The voter has a cardinal utility function $u : C \to \mathbb{R}$, where u(c)
is the utility of the voter if candidate c wins (different utility for
each candidate). In case of a tie with multiple winners $W \subseteq C$, the


[Figure 1, left panel: bar chart of the poll scores. x-axis: q1(40),
q2(30), q3(20), q4(10), q5(0); y-axis: number of votes.]

Decision model   | vote
KP, k = 2        | q4
KP, k = 4        | q1
CV, η = 8        | q2
CV, η = 10000    | q4
LD, r = 0.01     | q1
LD, r = 0.08     | q2
LDLB, r = 0.01   | q4
LDLB, r = 0.08   | q2

Figure 1: Left: Poll s from Example 2.1. The utility of each
candidate to the voter appears in brackets, and the height
of the column is the number of votes. Right: The candidate
selected by each decision model.

utility to the voter is $u(W) = \frac{1}{|W|} \sum_{c \in W} u(c)$. Denote by U(C) the
set of all utility functions over the set C.

Prior to her vote, the voter is faced with non-binding poll
information that reflects the popularity of each candidate. Formally, the
poll is a vector $s \in \mathbb{N}^m$, where s(c) is the number of voters expected
to vote for c. Denote $n = \sum_{c \in C} s(c)$.

We index the candidates q1, q2, ... from the perspective of the
voter, where q1 is the most preferred, then q2, and so on.


A decision model (for Plurality with m candidates and a poll) is
a function $M : U(C) \times \mathbb{N}^m \to C$. Here, $M(u, s) \in C$ is the vote
of a voter with utility function u, using decision model M given
a poll s. We use a superscript for the name of the decision model,
and subscripts to denote voter-specific parameters, if relevant. For
example, a voter who is always truthful regardless of the poll follows
the decision model $M^{truth}(u, s) := \arg\max_{c \in C} u(c)$, which is q1.

To illustrate we introduce a running example with 5 candidates, 
and specify which candidate the voter will choose under every 
decision model. 


Example 2.1. The set of candidates is C = {q1,...,q5}, and the
voter’s utility is described by the vector u = (40, 30, 20, 10, 0)
(preferences are lexicographic). Poll scores are given by
s = (25, 70, 20, 100, 80), where n = 295 voters. Figure 1 (left) shows
the poll scores of all candidates graphically.


2.1 Decision models from the literature 


We describe decision-making models of voting behavior from the 
literature. Figure 1 (right) shows the result of using each of the 
decision models, applied to the voting decision in Example 2.1. 


k-pragmatist (KP): Let $B_k(s)$ contain the k candidates with the
highest score in s. Reijngoud and Endriss [26] formalized the k-pragmatist
heuristic (following early work such as [7]), which selects the most
preferred candidate among the k candidates with highest score in $B_k(s)$:

$$M^{KP}_k(u, s) := \arg\max_{c \in B_k(s)} u(c).$$

We allow k to be an individual parameter that differs from voter
to voter. When k = 1 the rule always selects the leader of the poll,
and for k = m, $M^{KP}_m = M^{truth}$. In Figure 1, for k = 2 the voter will
vote for the candidate that is most preferred among the two leading
candidates (q4 and q5). For k = 4, the voter considers all candidates
except q3 as possible winners, and will vote for her most preferred
candidate q1.
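The truthful and k-pragmatist rules can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, candidates are represented by 0-based indices (so index 0 is q1), and poll ties are assumed to be broken in favor of the lower index, which the text does not specify.

```python
from typing import Sequence


def truthful(u: Sequence[float], s: Sequence[int]) -> int:
    """Index of the most preferred candidate; the poll s is ignored."""
    return max(range(len(u)), key=lambda c: u[c])


def k_pragmatist(u: Sequence[float], s: Sequence[int], k: int) -> int:
    """Most preferred candidate among the k poll leaders B_k(s)."""
    # sorted() is stable, so equal scores keep index order
    # (an assumed tie-breaking rule, not taken from the paper).
    ranked = sorted(range(len(s)), key=lambda c: -s[c])
    return max(ranked[:k], key=lambda c: u[c])


# Example 2.1: indices 0..4 stand for q1..q5.
u = [40, 30, 20, 10, 0]
s = [25, 70, 20, 100, 80]
print(k_pragmatist(u, s, 2))  # 3, i.e. q4
print(k_pragmatist(u, s, 4))  # 0, i.e. q1
```

With k equal to the number of candidates the sketch coincides with the truthful rule, matching the remark above.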


Calculus of Voting (CV): The calculus of voting suggests that a 
rational voter always votes in a way that maximizes her expected 
utility [21, 27]. The complications of the model usually arise from 
the fact that the voter is assumed to know the other voters’ pref- 
erences, and uses an equilibrium model to predict their votes. We 
consider a simpler version where the distribution of votes is given 
exogenously [20], as is the case with poll information. 

We denote by D(s) the distribution on the actual candidate
scores, conditional on poll scores s. We say a voter is pivotal for
candidate y over x if voting for y makes y a joint or unique winner,
whereas any other vote results in the victory of candidate x. Denote
by $P_{s,D}(x, y)$ the probability that the voter is pivotal for y over x
given the distribution over candidate scores D induced by poll s. A
voter following the calculus of voting (CV) model maximizes her
expected utility:

$$M^{CV}(u, s) := \arg\max_{c \in C} \sum_{c' \in C} P_{s,D}(c', c)\,(u(c) - u(c')).$$

To make the model concrete, we determine a specific distribution
D in a way that depends on the score of the candidates in the poll
s. We use $P_{s,\eta}$ as a shorthand for $P_{s,D}$ when D(s) is a multinomial
distribution with η voters, and the probability for sampling a vote
for each candidate c is s(c)/n. When η = n (i.e., the true number
of voters), this means that $M^{CV}_n$ selects the candidate that exactly
maximizes the voter’s expected utility given the true distribution
over candidate scores. However, the $M^{CV}_\eta$ decision model allows for
a more flexible, bounded-rational decision: when η < n the voter
overestimates her true pivot probability, and thus her influence on
the outcome, whereas η > n means that she underestimates her
influence. In Figure 1, when η = 10000 the resulting vote is q4 and
when η = 8 the resulting vote is q2.
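For small η, a CV-style choice can be computed exactly by enumeration. The sketch below is a related but not identical formulation: instead of summing pairwise pivot probabilities, it directly computes the expected utility of each possible vote over all multinomial outcomes, averaging utility over tied winners. It assumes that the η sampled voters are the other voters and that the voter's own ballot is added on top; all function names are hypothetical, and enumeration is only feasible for small η.

```python
from math import factorial
from typing import Iterator, Sequence, Tuple


def count_vectors(total: int, m: int) -> Iterator[Tuple[int, ...]]:
    """All ways to split `total` votes among m candidates."""
    if m == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in count_vectors(total - first, m - 1):
            yield (first,) + rest


def multinomial_pmf(counts: Sequence[int], probs: Sequence[float]) -> float:
    """Probability of the count vector under a multinomial distribution."""
    coef = factorial(sum(counts))
    p = 1.0
    for k, q in zip(counts, probs):
        coef //= factorial(k)  # exact at every step
        p *= q ** k
    return coef * p


def tie_utility(scores: Sequence[int], u: Sequence[float]) -> float:
    """Average utility over all candidates with the top score."""
    top = max(scores)
    winners = [c for c, x in enumerate(scores) if x == top]
    return sum(u[c] for c in winners) / len(winners)


def cv_vote(u: Sequence[float], s: Sequence[int], eta: int) -> int:
    """Vote maximizing expected utility when eta other votes are sampled."""
    n, m = sum(s), len(s)
    probs = [x / n for x in s]
    expected = [0.0] * m
    for counts in count_vectors(eta, m):
        p = multinomial_pmf(counts, probs)
        if p == 0.0:
            continue
        for c in range(m):
            scores = list(counts)
            scores[c] += 1  # add the voter's own ballot
            expected[c] += p * tie_utility(scores, u)
    return max(range(m), key=lambda c: expected[c])


# Two candidates, evenly split poll: voting for the favorite is optimal.
print(cv_vote([1, 0], [1, 1], eta=2))  # 0
```

For the five candidates of Example 2.1 and η = 8 the loop visits 495 count vectors, so it runs quickly; for η in the thousands an approximation of the pivot probabilities would be needed instead.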


Local Dominance (LD): Under the local dominance model [18,
19], a bounded-rational voter has an ‘uncertainty parameter’ r. Meir
et al. [19] characterize the set of undominated candidates U(s, u, r)
in poll s for a voter with utility u and parameter r:

• The set of Possible Winners PW includes all candidates
whose score in s is at least $\max_{c \in C} s(c) - 2rn$.

• If $|PW| \ge 2$, then the undominated candidates are all
candidates in PW except the least preferred.

• If |PW| = 1, then all candidates are undominated.

The decision model of such a voter selects the most preferred
undominated candidate, if more than one exists:

$$M^{LD}_r(u, s) := \arg\max_{c \in U(s,u,r)} u(c).$$

In Figure 1 we see that for r = 0.01 the voter believes that the
poll is very accurate (the score of each candidate may change by
at most r·n < 3 votes), and there is only one possible winner
(PW = {q4}). In this case, all candidates are undominated and the
voter remains truthful ($M^{LD}_{0.01}(u, s) = q_1$). When r = 0.08, the voter
believes that the poll is not very accurate and PW = {q2, q4, q5}. In
such a case both q2 and q4 are undominated and $M^{LD}_{0.08}(u, s) = q_2$.
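The three rules above translate almost directly into code. The following Python sketch uses hypothetical function names and 0-based candidate indices (index 0 is q1), and reproduces the two LD votes reported for Example 2.1.

```python
from typing import List, Sequence


def possible_winners(s: Sequence[int], r: float) -> List[int]:
    """Candidates whose poll score is within 2*r*n of the leader."""
    n = sum(s)
    threshold = max(s) - 2 * r * n
    return [c for c in range(len(s)) if s[c] >= threshold]


def undominated(u: Sequence[float], s: Sequence[int], r: float) -> List[int]:
    """Undominated candidates U(s, u, r) as characterized in the text."""
    pw = possible_winners(s, r)
    if len(pw) == 1:
        return list(range(len(s)))  # every candidate is undominated
    least = min(pw, key=lambda c: u[c])
    return [c for c in pw if c != least]


def local_dominance(u: Sequence[float], s: Sequence[int], r: float) -> int:
    """Most preferred undominated candidate."""
    return max(undominated(u, s, r), key=lambda c: u[c])


u = [40, 30, 20, 10, 0]
s = [25, 70, 20, 100, 80]
print(possible_winners(s, 0.08))    # [1, 3, 4], i.e. {q2, q4, q5}
print(local_dominance(u, s, 0.01))  # 0, i.e. q1
print(local_dominance(u, s, 0.08))  # 1, i.e. q2
```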


Local-Dominance with Leader bias (LDLB): Inspired by the
findings of Tal et al. [28] on “leader bias”, we modify the local
dominance model to allow such behavior: when the voter is certain
that there is only one possible winner (|PW| = 1), she simply votes
for the leader (instead of truthfully), i.e., $M^{LDLB}_r(u, s)$ is the
single candidate in PW, and otherwise $M^{LDLB}_r(u, s) := M^{LD}_r(u, s)$.
In Figure 1 we see that this model acts similarly to the LD model.
However, when there is only one possible winner, a voter following
the LDLB model will vote for the leader (q4 in this case).

Figure 2: The attainability $A_\beta$ for different values of β in a
poll with 3 candidates.
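A minimal Python sketch of the leader-biased variant follows. It is self-contained, uses a hypothetical function name and 0-based candidate indices, and differs from plain local dominance only in the branch where there is a single possible winner.

```python
from typing import Sequence


def ldlb_vote(u: Sequence[float], s: Sequence[int], r: float) -> int:
    """Local dominance with leader bias."""
    n = sum(s)
    threshold = max(s) - 2 * r * n
    pw = [c for c in range(len(s)) if s[c] >= threshold]
    if len(pw) == 1:
        return pw[0]  # leader bias: vote for the sole possible winner
    # Otherwise behave like LD: drop the least preferred possible
    # winner and pick the most preferred of the rest.
    least = min(pw, key=lambda c: u[c])
    return max((c for c in pw if c != least), key=lambda c: u[c])


u = [40, 30, 20, 10, 0]
s = [25, 70, 20, 100, 80]
print(ldlb_vote(u, s, 0.01))  # 3, i.e. q4 (the poll leader)
print(ldlb_vote(u, s, 0.08))  # 1, i.e. q2 (same as LD)
```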


Attainability (AT): Bowman et al. [6] provide a model for voting
over multiple binary issues. The attainability of issue j is a measure
of certainty that the eventual number of votes cast for j will reach
the majority threshold required for approval. It is defined as

$$A_\beta(j, s) = \frac{1}{\pi} \arctan\left(\beta \left(\frac{s_j}{n} - \frac{1}{2}\right)\right) + \frac{1}{2},$$

where $s_j$ is the expected number of votes in favor of issue j, and β
is a voter-specific parameter.

The “candidates” considered in Bowman et al. [6] are all possible
subsets of issues, i.e., $C = 2^{\{1,\ldots,k\}}$, where w.l.o.g. the voter gains
some nonnegative utility $u_j$ from each issue j being approved. Then
the utility of a candidate $c \subseteq \{1,\ldots,k\}$ is the sum of utilities of all
issues in c, and its attainability $A_\beta(c, s)$ is defined as the product of
$A_\beta(j, s)$ for all $j \in c$, and $1 - A_\beta(j, s)$ for all $j \notin c$. The voter selects
the candidate c that maximizes the product of its attainability and
utility ($A_\beta(c, s) \cdot u(c)$).

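To adapt the decision model to Plurality voting with m candidates,
we re-define the attainability function as

$$A_\beta(c, s) := \frac{1}{\pi} \arctan\left(\beta \cdot \left(\frac{s(c)}{n} - \frac{1}{m}\right)\right) + \frac{1}{2},$$

and define the attainability choice function (AT) as

$$M^{AT}_\beta(u, s) := \arg\max_{c \in C} A_\beta(c, s) \cdot u(c).$$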


Figure 2 shows how β affects the attainability score. Candidates
that are tied have the same attainability. As shown by the figure,
high β means that a small advantage in score translates to a large
gap in attainability.
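The effect of β can be explored with a short Python sketch. It assumes the normalized arctan form of attainability, (1/π)·arctan(β·(s(c)/n − 1/m)) + 1/2, which is a reading of the damaged formula in the text and should be treated as an assumption; the function names are hypothetical and candidates are 0-based indices.

```python
from math import atan, pi
from typing import Sequence


def attainability(s: Sequence[int], c: int, beta: float) -> float:
    """Assumed form: arctan of the scaled gap to the 1/m share, mapped to (0, 1)."""
    n, m = sum(s), len(s)
    return atan(beta * (s[c] / n - 1 / m)) / pi + 0.5


def at_vote(u: Sequence[float], s: Sequence[int], beta: float) -> int:
    """Candidate maximizing attainability times utility."""
    return max(range(len(s)), key=lambda c: attainability(s, c, beta) * u[c])


u = [40, 30, 20, 10, 0]
s = [25, 70, 20, 100, 80]
for beta in (1.0, 5.0, 50.0):
    # Larger beta spreads the attainability values further apart.
    print(beta, [round(attainability(s, c, beta), 3) for c in range(len(s))])
print(at_vote(u, s, beta=5.0))  # 1, i.e. q2
```

Under this assumed form, tied candidates receive identical attainability and the values always stay strictly between 0 and 1.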


3 THE ATTAINABILITY-UTILITY (AU) 
MODEL 


Bowman et al.'s AT model allows voters some flexibility in how they
estimate attainability using the β parameter. However, it assumes
the same model for each voter. We extend the attainability model
by an additional parameter that lets each voter choose a different
tradeoff between the attainability and utility of candidates. To this
end we define the Attainability-Utility (AU) decision rule as

$$M^{AU}_{\alpha,\beta}(u, s) := \arg\max_{c \in C} \left( (\epsilon + u(c))^{\alpha} \cdot (A_\beta(c, s))^{2-\alpha} \right),$$

where ε is a small constant added to handle 0 utility (ε can also be
used as a parameter to control the utility range).

Figure 3: The AU scores of all five candidates from Example 2.1
for β = 5 and different values of α.

Intuitively, the α parameter trades off the relative importance of
attainability and utility, where α = 0 means the voter always selects
the candidate with maximal score, and α = 2 means the voter is
always truthful. Figure 3 shows how the relative score (and the
selected candidate) changes as we increase α. When α is small, AU
will prefer q4 as it has more votes (higher attainability), and when
it is large AU will prefer q1 as it has higher utility. Note that we
get the AT model as a special case when setting α = 1, ε = 0. We
further discuss the meaning of these parameters in Sec. 6.1.
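The role of the tradeoff parameter can be checked numerically on Example 2.1. The Python sketch below assumes the AU score (ε + u(c))^α · A(c, s)^(2−α) with the normalized arctan attainability; the exact attainability form, the default ε = 1e-6, and the function names are assumptions made for illustration.

```python
from math import atan, pi
from typing import Sequence


def attainability(s: Sequence[int], c: int, beta: float) -> float:
    """Assumed normalized arctan form of attainability."""
    n, m = sum(s), len(s)
    return atan(beta * (s[c] / n - 1 / m)) / pi + 0.5


def au_vote(u: Sequence[float], s: Sequence[int],
            alpha: float, beta: float, eps: float = 1e-6) -> int:
    """Candidate maximizing (eps + u)^alpha * A^(2 - alpha)."""
    def score(c: int) -> float:
        return (eps + u[c]) ** alpha * attainability(s, c, beta) ** (2 - alpha)
    return max(range(len(s)), key=score)


u = [40, 30, 20, 10, 0]
s = [25, 70, 20, 100, 80]
print(au_vote(u, s, alpha=0.0, beta=5.0))           # 3: q4, the poll leader
print(au_vote(u, s, alpha=2.0, beta=5.0))           # 0: q1, the truthful vote
print(au_vote(u, s, alpha=1.0, beta=5.0, eps=0.0))  # 1: q2, same as AT
```

The two extremes behave as described: α = 0 ignores utility and α = 2 ignores the poll, while intermediate values interpolate between them.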


4 METHODOLOGY 


Datasets. We evaluated the different models described above on 
several datasets as follows. 


                  | D32      | D36      | TMG15      | TS16
# participants    | 187      | 335      | 437        | 144
# voters in poll  | 1000     | 1000     | 8 to 10000 | 812
# rounds          | up to 32 | up to 36 | up to 20   | 40
# instances       | 4886     | 9478     | 8011       | 5760

Three of the datasets (D32, D36 and TMG15) were collected using 
the framework of Tal et al. [28], in which voters played multiple 
one-shot voting rounds. A snapshot of the GUI used for this setting 
is shown in Figure 4. 

Each round included a single human participant, who is
automatically assigned preferences over candidates, observes a noisy
“poll” with the expected votes of the entire population (e.g., 1000
voters), and then votes once.

The outcome of the round was generated by sampling each of
the other votes i.i.d. using the poll scores as the distribution (e.g., in
Fig. 4 we sample 102 “voters”, each of which votes Blue with
probability equal to Blue's share of the poll).
Participants were only informed that the poll was inaccurate, but
not of the exact distribution. The final score of each candidate and
the outcome were shown at the end of each round. In all datasets
the reward for participants was determined by the position of the
winning candidate in their preferences, using the average reward if
there was more than one winning candidate.
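The outcome generation described above can be sketched as follows. This is an illustration rather than the experiment's actual code: the function name, the poll values, and the reward vector in the usage lines are hypothetical, and candidates are 0-based indices.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def simulate_round(poll: Sequence[int], vote: int, n_others: int,
                   rewards: Sequence[float],
                   rng: random.Random) -> Tuple[List[int], List[int], float]:
    """Sample the other votes i.i.d. from the poll, then add the participant's vote."""
    candidates = list(range(len(poll)))
    # Each other voter picks candidate c with probability poll[c] / sum(poll).
    others = rng.choices(candidates, weights=poll, k=n_others)
    scores = Counter(others)
    scores[vote] += 1
    top = max(scores.values())
    winners = [c for c in candidates if scores[c] == top]
    # Ties pay the average reward over all winning candidates.
    reward = sum(rewards[c] for c in winners) / len(winners)
    return [scores[c] for c in candidates], winners, reward


rng = random.Random(0)
scores, winners, reward = simulate_round(
    poll=[40, 35, 25], vote=0, n_others=102, rewards=[20, 10, 0], rng=rng)
print(scores, winners, reward)
```

Passing an explicit `random.Random` instance keeps a simulated session reproducible without touching the global random state.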


[Screenshot: “VOTE NOW!” screen of a round with 103 players; rewards of
20, 10 and 0 coins for the 1st, 2nd and 3rd priority candidates.]

Figure 4: A snapshot from the experiment (taken from Tal
et al. [28]). Blue is the most preferred candidate q1, thus if
Blue wins, the participant gets 20 coins (10 cents). The bars
show the poll scores s1, s2, s3.


We used this framework to generate datasets D32 and D36,
presenting the participants with a different poll each time. All
participants were recruited via the Amazon Mechanical Turk
platform. The reward was $R_i$ for each round where $q_i$ was elected,
where $R_1 > R_2 > R_3$. For most of the participants, we set $R_1$ =
10¢, $R_2$ = 5¢, $R_3$ = 0¢. For some participants we varied the rewards.

The dataset TS16 was generated by Tyszler and Schram [30].
Here, every voting round was a 12-player complete information
game with dictated preferences over 3 candidates, and the outcome
was the result of all actual votes rather than artificial samples. We
used the other 11 voters’ true top preferences (which are visible) as
a true “poll” input to the different decision models.


In all datasets, the voter may have a monetary incentive to vote
for q2 only when q1 is ranked last in the poll. There is never a
monetary incentive to vote for q3.


Random Forest (RF) Benchmark. We applied off-the-shelf machine
learning algorithms to build predictive models of voting behavior.
We used two types of features: those relating to the particular
voting round (examples: the gap s1 − s2 between the two leaders of
the poll, the number of votes s2 in the poll, the winning candidate in
the poll); and those aggregating the behavior of the voter (examples:
the frequency that the voter chose q1, q2 and q3 in the training set,
the frequency of a strategic compromise, and the number of
dominated actions). Using these features, we compared the performance
of black-box prediction models on the D32 dataset.

We compared Random Forest, Neural Network, AdaBoost,
CART (Decision Tree), Support Vector Machine and
Logistic Regression algorithms.¹ The best performance was exhibited by a
random forest ensemble model using 100 weak trees as subclassifiers,
and a Gini splitting criterion. We thus used this algorithm
(henceforth, RF) as our benchmark.


Evaluation. We used a ten-fold cross validation method. We di- 
vided the data of each voter into 10 folds (when possible). KP has 
only three parameter values. For the other models, we discretized 
the parameter space. 

For each of the decision models KP, CV, AT, LD, LDLB and AU,
we used a basic fitting procedure to train each model separately
for each voter: 9 folds of data for this voter were used to fit the
parameters of the model, and the obtained model was applied to the
tenth fold to predict the voter’s actions. Since each voter has only
a few samples and the parameter space of each decision model is
small, we used a brute-force search to find the best parameters for
each model. For example, for the LD model, we found for each voter
the parameter r such that $M^{LD}_r$ agrees with the largest number of
rounds in the training set.

¹ We used the sklearn ensemble python package for this purpose [9]. The full list
of features is available at https://github.com/AdamLauz/OneShotVoting/blob/master/Documentation/One_Shot_ML_features_description.pdf.

Figure 5: Prediction error for each of the decision models on
all four datasets. The horizontal orange lines mark the
performance of the RF benchmark when trained on the entire
data (solid) or restricted to individual data (dotted).

The prediction error of a model is the number of wrong predic- 
tions on the test set, divided by the total number of rounds. 
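The per-voter fitting and evaluation loop can be sketched for the LD model as follows. This is a simplified illustration: the fold assignment (round-robin by position), the parameter grid, and all function names are assumptions, and each round is represented as a hypothetical `(u, s, vote)` triple with 0-based candidate indices.

```python
from typing import List, Sequence, Tuple

Round = Tuple[Sequence[float], Sequence[int], int]  # (utilities, poll, vote)


def ld_vote(u: Sequence[float], s: Sequence[int], r: float) -> int:
    """Local dominance vote for uncertainty parameter r."""
    n = sum(s)
    pw = [c for c in range(len(s)) if s[c] >= max(s) - 2 * r * n]
    if len(pw) == 1:
        pool = list(range(len(s)))
    else:
        least = min(pw, key=lambda c: u[c])
        pool = [c for c in pw if c != least]
    return max(pool, key=lambda c: u[c])


def fit_r(train: List[Round], grid: Sequence[float]) -> float:
    """Brute-force search: the r agreeing with the most training rounds."""
    def hits(r: float) -> int:
        return sum(ld_vote(u, s, r) == v for u, s, v in train)
    return max(grid, key=hits)


def prediction_error(rounds: List[Round], r: float) -> float:
    """Fraction of rounds where the model's vote differs from the actual vote."""
    return sum(ld_vote(u, s, r) != v for u, s, v in rounds) / len(rounds)


def cross_validated_error(rounds: List[Round], grid: Sequence[float],
                          folds: int = 10) -> float:
    """Fit on all folds but one, predict the held-out fold, pool the errors."""
    folds = min(folds, len(rounds))
    wrong = 0
    for i in range(folds):
        test = rounds[i::folds]
        train = [x for j, x in enumerate(rounds) if j % folds != i]
        r = fit_r(train, grid)
        wrong += sum(ld_vote(u, s, r) != v for u, s, v in test)
    return wrong / len(rounds)


grid = [i / 100 for i in range(51)]
u = [20, 10, 0]
polls = [[300, 400, 300], [100, 500, 400], [450, 300, 250], [200, 350, 450]]
rounds = [(u, s, ld_vote(u, s, grid[10])) for s in polls]  # synthetic voter
print(cross_validated_error(rounds, grid))
```

When several grid values tie on the training rounds, `max` returns the first one, which is an arbitrary choice in this sketch.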


5 RESULTS AND ANALYSIS 


Figure 5 shows the performance of all decision models on the 
datasets. We report the prediction error of the models, adding error 
bars of two standard deviations. 

We can see that the AU model outperforms all other decision 
models, with the LDLB model second, and the models that ignore 
leader bias (CV, LD) far behind. These results are statistically sig- 
nificant in all datasets (p < 0.05) except in TS16, where there was 
no significant difference between the performance of AU and AT. 


5.1 AU performance vs. the benchmark 


Random Forest (RF) uses many features and can create an arbitrarily 
complicated model, which learns from the entire population of 
hundreds of voters rather than just from several individual samples. 
In addition, it uses temporal features and can thus in principle 
predict even behavior that changes over time. This is why we use 
RF as a benchmark that is supposed to be hard to beat. Even so, RF 
does not perform uniformly better than the behavioral models. 

Figure 6 breaks down the error of the AU model by individual
voters (we refer to the different colors in the next subsection). It
demonstrates visually that AU beats RF for many individual voters
(about 100 out of 335), and that the advantage of RF is mainly due
to a group of voters for which AU seems to perform substantially
worse (those below the dashed line). Indeed, it is possible that the AU
model is appropriate for most voters but not for all (see discussion
in Sec. 6.2).
Figure 6: Prediction error of AU versus RF for each voter.
Only voters with at least 16 rounds are shown. The prediction
of RF for voters below the dashed line was better by at
least 10 percentage points.


Another factor is the data used for learning: while AU and all
other decision models fit their parameters for a particular voter
solely based on her own behavior in other rounds, the black-box
algorithms had access to the votes of other voters as well. When
restricted to learn only from the samples that belong to the same
individual, the error of RF increased dramatically (see dotted lines in
Fig. 5). This indicates that information about the entire population
could be exploited to further improve the behavioral models. In
addition, the performance of RF (as well as the other black-box
algorithms we tried) degrades more rapidly when we learn from a
small or non-representative sample.


5.2 Where are the errors?


We analyze the factors that contributed most to prediction error, 
with a focus on the AU model. 


Some voters are harder to predict. We say that a candidate is
dominated (in a particular round) if there is another candidate that
has a higher score in the poll and is also more preferred
by the voter. E.g., q3 in Example 2.1 is dominated by q2. We count
the number of dominated actions each voter performed throughout
the experiment. Note that a dominated action is never predicted by
any of the decision models we considered. It is hard to think of any
rational justification for voting for a dominated candidate. We thus
conjecture that dominated actions are an indication of some random
component in the behavior of the voter.
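The definition of a dominated candidate is easy to operationalize. The Python sketch below uses hypothetical function names, 0-based candidate indices, and rounds given as `(u, s, vote)` triples.

```python
from typing import List, Sequence, Tuple


def dominated_candidates(u: Sequence[float], s: Sequence[int]) -> List[int]:
    """Candidates beaten by some other candidate in both poll score and utility."""
    m = len(u)
    return [c for c in range(m)
            if any(s[d] > s[c] and u[d] > u[c] for d in range(m))]


def count_dominated_actions(
        rounds: Sequence[Tuple[Sequence[float], Sequence[int], int]]) -> int:
    """Number of rounds in which the voter chose a dominated candidate."""
    return sum(vote in dominated_candidates(u, s) for u, s, vote in rounds)


u = [40, 30, 20, 10, 0]
s = [25, 70, 20, 100, 80]
print(dominated_candidates(u, s))  # [2, 4], i.e. q3 and q5
```

In Example 2.1, q3 is dominated by q2 as stated in the text, and under the same definition q5 is dominated by q4.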

We classified voters by the number of times they used a
dominated action. In Fig. 6 we can see that the number of dominated
actions substantially affects prediction accuracy not just for AU
but also for the benchmark RF (and in fact for all models). The
prediction error of AU for voters who completely avoid dominated
actions is less than 18%, and increases to almost 50% for voters with
more than 2 dominated actions, indicating that their behavior is
almost completely unpredictable. We emphasize that every additional
dominated action results in more than one (about 1.6-3) prediction
errors. This, together with the low performance of RF, corroborates
our conjecture that dominated actions are merely an indication of
noisy or random voting patterns.
Figure 7: A histogram of all D36 voters by their AU prediction
error. We colored groups of voters for which certain
conditions apply.

Figure 8: Prediction error as a function of the number of
rounds in D36.


Another factor that substantially affects prediction error is the
number of rounds that a voter has played, where prediction error
for voters who completed fewer rounds is much higher. See Fig. 8,
which shows error as a function of the number of rounds per voter. A
likely explanation is that models fitted to fewer rounds are more
prone to overfitting.

The histogram in Fig. 7 shows that for almost all voters where 
AU had high error, the reason was random behavior (indicated by 
dominated actions), or few rounds. 


Behavior in some polls is hard to predict. The behavior in polls
that present the voter with an obvious dilemma (e.g., when her
favorite candidate is trailing behind) is naturally harder to predict.
In Table 1, we classified all polls into 6 poll types, based on the
order of candidates’ popularity in the poll.


poll type        | D32   | D36   | TMG15 | TS16
q1 > q2 > q3     | 0.085 | 0.113 | 0.076 | 0.047
q1 > q3 > q2     | 0.089 | 0.108 | 0.070 | 0.053
q2 > q1 > q3     | 0.224 | 0.261 | 0.268 | 0.254
q3 > q1 > q2     | 0.202 | 0.268 | 0.268 | 0.338
q2 > q3 > q1     | 0.233 | 0.250 | 0.258 | 0.296
* q3 > q2 > q1   | 0.363 | 0.403 | 0.419 | 0.470

Table 1: AU error for each poll type. The order reflects the
popularity of each candidate in the poll.

Figure 9: AU and AU_ε accuracy in D32. AU_ε is identical to
AU, except that ε is used as another parameter.


In the scenario most difficult to predict (where s(q3) > s(q2) >
s(q1), marked with *), the poll order is the reverse of the voter’s
preference order over the candidates, and all three actions are
frequently selected by the voters. For this case the prediction error
of the AU model is above 35% (and remains high even if we focus on
voters who played all rounds and avoided dominated actions). The
other models behaved similarly. Note that while in Fig. 5 prediction
accuracy varies considerably between datasets, this is explained by
the frequency of different poll types in each dataset (see Table 1).
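
The per-type breakdown of Table 1 can be illustrated with a minimal sketch. It assumes each round is recorded as a triple of poll scores, the model's predicted vote and the actual vote; the function names, the tie-breaking rule and the toy data are hypothetical and are not taken from our experiments.

```python
from collections import defaultdict

def poll_type(scores):
    """Return candidate indices sorted by decreasing poll score,
    e.g. (3, 2, 1) for s(q3) > s(q2) > s(q1).
    `scores` maps a candidate index (1 = most preferred) to its poll score.
    Ties are broken by candidate index (an assumption of this sketch)."""
    return tuple(sorted(scores, key=lambda c: (-scores[c], c)))

def error_by_poll_type(rounds):
    """Compute the fraction of mispredicted rounds for each poll type.
    `rounds` is an iterable of (scores, predicted_vote, actual_vote)."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for scores, predicted, actual in rounds:
        t = poll_type(scores)
        totals[t] += 1
        errors[t] += int(predicted != actual)
    return {t: errors[t] / totals[t] for t in totals}

# Hypothetical toy rounds, for illustration only.
toy_rounds = [
    ({1: 50, 2: 30, 3: 20}, 1, 1),  # favorite leads: trivial decision
    ({1: 10, 2: 30, 3: 60}, 2, 3),  # reversed order: misprediction
    ({1: 10, 2: 30, 3: 60}, 2, 2),  # reversed order: correct prediction
]
print(error_by_poll_type(toy_rounds))
```

Weighting the per-type errors by how often each type occurs in a dataset then gives that dataset's overall error, which is why datasets dominated by the first two poll types appear easier.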

We emphasize that in TS16, people played most rounds as part of
the majority group (see first two rows of Table 1), and thus faced
a trivial decision on which all models made the same prediction.
This could be the reason we did not obtain statistically significant
results on this dataset, and also explains the poor performance of
RF (which had few non-trivial rounds to learn from).


Negative reward is harder to predict. In D32, we varied the re- 
ward R3, to see the effect of positive/zero/negative reward (see 
Fig. 9). Higher reward R3 results in higher accuracy (the only statis- 
tically significant difference was between negative and zero reward, 
as we only varied the reward for 25 participants). 

A closer look revealed the reason for the excess failures: with a
negative reward, u3 + ε is still negative, and thus the AU model would
never select q3. Perhaps surprisingly, participants do not care much
about R3 being negative. Adding ε as a third optimization parameter
that may take higher values (so that u3 + ε > 0) completely negates
that effect, as can be seen by the striped columns in Fig. 9. We also
varied the reward R2 to get convex and concave utility scales rather
than linear. While higher R2 does lead to more frequent votes for
q2, we did not observe a consistent effect on prediction error.
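
The effect of ε can be illustrated with a small sketch. It is not the exact AU definition: we assume here that a candidate's value is the product of a logistic attainability of its poll share and the shifted utility u + ε, and the parameters `beta` and `center`, the rewards and the poll shares are all hypothetical choices made for illustration.

```python
import math

def attainability(share, beta=20.0, center=1 / 3):
    """Logistic (logit-shaped) transformation of a poll share into (0, 1).
    The functional form and parameter values are assumptions of this sketch."""
    return 1.0 / (1.0 + math.exp(-beta * (share - center)))

def au_vote(shares, utilities, eps=0.0, beta=20.0):
    """Vote for the candidate maximizing attainability * (utility + eps).
    This product form is an assumed stand-in for the AU heuristic."""
    values = {
        c: attainability(shares[c], beta) * (utilities[c] + eps)
        for c in shares
    }
    return max(values, key=values.get)

# Hypothetical poll in which the least preferred candidate q3 leads by far.
shares = {1: 0.02, 2: 0.03, 3: 0.95}
# Hypothetical rewards with a negative R3.
utilities = {1: 10.0, 2: 5.0, 3: -3.0}

print(au_vote(shares, utilities, eps=0.0))  # u3 + eps < 0, so q3 cannot win
print(au_vote(shares, utilities, eps=4.0))  # u3 + eps > 0, so q3 can be chosen
```

Since attainability is always positive, a candidate with u + ε < 0 has a negative value and loses to any candidate with positive utility, whatever the poll says. Letting ε grow until u3 + ε > 0 removes this restriction.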


5.3 Subjective reporting by participants 


We asked each participant in our experiments (datasets D32 and D36)
to report how well they understood the instructions; which strategy
they used in the study; and whether they changed strategy during
the game.


Descriptions of strategies. Some of the participants described 
strategies that are similar to the models we tested from the voting 
literature. Some primary examples appear below. 


• “I tried to vote for the person most likely to beat the candidate
  that would give me no coins.” - Describes KP with k = 2.

• “I voted for either my first or second priority candidate. I was
  more likely to vote for the one that appeared to have the highest
  probability of winning.” - Describes behavior similar to AT.

• “My strategy was to mainly vote for who was leading except
  when it was a close race and then I voted for who would earn
  me the most points.” - Describes LDLB with low r.


Interestingly, people did not adhere to their reported strategies in
all rounds, and often their behavior was predicted more accurately
by a different model than the one they verbally described.

For example, some people who explicitly declared that they 
would not vote for the least preferred candidate did in fact choose 
this option. It is not clear whether this results from noisy behavior, 
from changing the behavior over time, or from poor self-reflection. 


Do voters use consistent strategies? Identifying individual changes
in strategy from the data is very difficult with only a handful of
samples per voter. However, in the subjective self-reports, about
63% of those who responded in D36 answered that they did not
change their strategy, whereas only 18% answered that they did.²

There was a strong correlation between how well people understood
the instructions (by their self-report) and their consistency: in
D36, more than 80% of those who reported perfect understanding
claimed they kept their strategy. Results in D32 were similar.

Recall the questions from the introduction about consistency and 
predictability. The level of (reported) consistency strongly affected 
the empirical error: the average error of AU for consistent voters in 
D36 was about 24% vs. 30% for inconsistent ones (and 22% vs. 34% 
in D32). This can also be partly seen in Fig. 7, where voters who 
reported strategy change (dotted blue) are responsible for slightly 
more errors. 


6 DISCUSSION 


Regenwetter et al. [25] observe that “...individual choice research
finds actors to behave worse than normative theory requires, whereas
the sparse empirical research on social choice appears to suggest that
electorates may outperform normative expectations.” However, most
research they refer to considered aggregated behavior, as discussed
in the early sections.

Our Attainability-Utility (AU) model explains well (and in par- 
ticular much better than calculus of voting) the behavior of most 
subjects in the data, except those with inherent inconsistencies 
in their actions. This partly settles the discrepancy observed by 
Regenwetter et al.: on the individual level, most voters follow AU 
or other heuristics that do not maximize expected utility, just like 
decision makers in other domains, even if on the aggregate level 
the vote distribution can be explained by more rational theories 
like calculus of voting [5, 14], or quantal response equilibrium [30]. 
Interestingly, quantal response can account for the frequency of 
dominated actions at the aggregate level, even if it cannot predict 
when a particular action will be dominated. We may therefore get a 
more complete picture of voters’ behavior by combining individual 
and aggregate analysis (see also future work below). 


6.1 Is AU cognitively plausible? 


There are two seemingly “irrational” components in the AU model
(both inherited from Bowman [6]), that become apparent when
we compare it to the “rational” Calculus of Voting method. The
first is the fact that the voter assesses the chances of each candidate
to win, rather than of each possible tie. The second is that this
chance is estimated using a somewhat arbitrary transformation
of the candidate’s score (the logit-shaped “attainability function”),
rather than by explicit probabilistic calculations.

²The others provided an answer that could not be easily classified, e.g. “It took me a
few rounds to get the hang of it.”

Both observations are much less surprising when we recall Kahneman
and Tversky’s account of judgment under uncertainty [29]:
they explain that people often use simple substitutes for probabilistic
calculation that require low cognitive effort. For example, they rely
on how representative each event is (in our case, the score of each
candidate in the poll).³ Future experiments can further test this
hypothesis by making candidates more prominent in other ways
than higher score (e.g., using graphic features), and see if voters’
behavior can still be explained when we translate this to greater
attainability. The extensive literature on the various heuristics people
use to evaluate likely outcomes (e.g. [2, 10]) can also be used to
develop better models of voting behavior.

It is worth mentioning that trying to use a simple substitute for 
probabilistic calculations was the main motivation behind the Local 
Dominance model [19], but Local Dominance (like the Calculus of 
Voting) still focuses on ties. The fact that AU better explains the
behavior of most voters (and in particular that LD fails to predict
leader-biased actions) suggests that perhaps even LD is too cognitively
demanding. Indeed, except for KP (which is perhaps too
simple), AU is the cognitively easiest heuristic to apply, as it
independently evaluates each candidate. We note that the differences
in cognitive burden become even more accentuated in elections 
with more candidates. We therefore expect the differences in per- 
formance to become more significant as well, and are currently 
collecting more data to test this hypothesis. 


6.2 Are there voters of different types? 


While AU had the best performance overall, there are many individ- 
ual voters that are better predicted by one of the other models (not 
necessarily the leading one). This can be seen in the bottom bar in 
Figure 10 which shows the number of voters that were optimally 
predicted by each model. In case of ties we ‘split’ the voter among 
all leading models. To illustrate, although 129 voters were best pre- 
dicted by the AU heuristic, almost as many (120 voters) were best 
predicted by the KP heuristic. 
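
The counting behind the bottom bar of Figure 10 can be sketched as follows. We assume here that a tied voter is divided equally among the tied models, which is one natural reading of ‘split’; the function name and the toy error values are hypothetical.

```python
from collections import defaultdict

def best_model_counts(errors_by_voter):
    """Count, for each model, the voters it predicts best.
    `errors_by_voter` maps voter id -> {model name: prediction error}.
    A voter whose best error is shared by k models contributes 1/k to
    each of them (equal splitting is an assumption of this sketch)."""
    counts = defaultdict(float)
    for errors in errors_by_voter.values():
        best = min(errors.values())
        # Exact comparison is adequate when errors are ratios of small counts.
        leaders = [m for m, e in errors.items() if e == best]
        for m in leaders:
            counts[m] += 1.0 / len(leaders)
    return dict(counts)

# Hypothetical per-voter errors, for illustration only.
toy_errors = {
    "v1": {"AU": 0.1, "KP": 0.2, "LD": 0.3},  # AU alone is best
    "v2": {"AU": 0.2, "KP": 0.2, "LD": 0.4},  # AU and KP tie
    "v3": {"AU": 0.5, "KP": 0.3, "LD": 0.3},  # KP and LD tie
}
print(best_model_counts(toy_errors))
```

With this convention the counts always sum to the number of voters, so the bars of different models remain directly comparable.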

At this point we face a dilemma when trying to explain the 
reason that so many voters are better predicted by other models: 
One hypothesis (H1) is that AU can in principle account for the 
behavior of all voters, but is overfitting its parameters due to the 
small dataset of each voter. An alternative hypothesis (H2) is that 
there are indeed voters with different inherent behaviors that are 
better captured by other models, such as LDLB, KP and so on. 

Some evidence for H2 is in the self reports where participants
described distinct strategies. However, we believe there is stronger
evidence for H1: first, the self reports are often inconsistent with
the actual behavior, and AU in fact predicts well many of the voters
who described specific strategies. Also, AU can in principle explain
(for some parameter values) almost all voters in the data, but we
often fail to select the optimal parameters due to the small training
sample, so other models have fewer prediction errors. This can
be seen in Fig. 8 where AU improves faster as voters have more
samples to train on. Lastly, the advantage of AU becomes more clear
once we focus on voters with low error (top bar in Fig. 10). These
are the voters for which the selected model is more meaningful.

³We tried a variation of the AU model, where attainability was replaced with the actual
winning probability. This did not improve the performance of the model.

Figure 10: The bottom bar shows, for each model, the number
of participants in D32 and D36 for which this model
achieved the best accuracy (possibly tied with other models).
The top bar shows the same information, for the subset of
voters for which the best prediction error was at most 0.2.

To better answer this question, richer datasets that better distinguish
between decision models are needed.


6.3 Discussion and future work


Finding a model that perfectly explains the behavior of all voters 
is probably impossible. Yet, our AU model does well both on the 
“behavioral” and on the “scientific” criteria presented in [19]: It is 
a fairly simple and cognitively plausible model, that captures the 
behavior of most voters well enough to predict their individual 
actions in various situations, and even to compete with machine- 
learning algorithms that use hundreds of features from the entire 
population. This model trades off the popularity of a candidate (as
a proxy for its winning chances) and its utility to the voter.

Future voting models should be extended to allow behavior that 
changes over time in some predictable way. More importantly, 
deterministic decision models should be combined with stochastic 
ones like quantal response and trembling hand perfection [17, 23] to 
explain both consistent individual choices and random departures 
from those choices. New evaluation methods are needed for these 
combined aggregate and individual choices. 

Our findings can inform the development of new and better models
for strategic voting, much like the PrefLib project [15] is contributing
to the study of preference structure. They can also inform the
design of agents for making voting decisions with other people,
which is a growing area of research [3, 34].

Most of the decision models we used, including the new AU 
heuristic, naturally extend to more candidates and other voting 
rules. We intend to run experiments in more diverse settings (e.g. 
more than 3 candidates). Those experiments can expose behaviors 
that do not exist in the current data, can help differentiate between 
the decision models, and serve as a benchmark for the development 
of new models. 